Generating Training Data for Learning Linear Composite Dispatching Rules for Scheduling
Abstract
A supervised learning approach to generating composite linear priority dispatching rules for scheduling is studied. In particular, we investigate a number of strategies for generating the training data used to learn a linear dispatching rule with preference learning. The results show that generating a training set from optimal solutions only is not as effective as also adding suboptimal solutions to the set. Furthermore, different strategies for creating preference pairs are investigated, as well as suboptimal solution trajectories. The strategies are evaluated on 2000 randomly generated problem instances using two different problem generator settings.

When applying learning algorithms, the training set is of paramount importance. A training set should carry sufficient knowledge of the problem at hand. This is done through features which are intended to capture the essential measures of a problem's state. For this purpose, the job-shop scheduling problem (JSP) is used as a case study to illustrate a methodology for generating meaningful training data that can be successfully learned. JSP deals with the allocation of jobs to competing resources, where the goal is to minimise a schedule's maximum completion time, i.e., the makespan, denoted Cmax. In order to find good solutions, heuristics are commonly applied in research, such as the simple priority based dispatching rules (SDR) from [11]. Composites of such simple rules can perform significantly better [6]. As a consequence, a linear composite of dispatching rules (LCDR) was presented in [3]. The goal there was to learn a set of weights, w, via logistic regression such that

    h(x_j) = ⟨ w · φ(x_j) ⟩,    (1)

yields the preference estimate for dispatching job j corresponding to post-decision state x_j, where φ(x_j) denotes its feature mapping. The job dispatched is then

    j* = argmax_j { h(x_j) },    (2)

as illustrated by the code sketch below. The approach was to use supervised learning to determine which feature states are preferable to others, where the training data was created from optimal solutions of randomly generated problem instances. An alternative is to minimise the expected Cmax by optimising the weights directly, for instance with the evolutionary search CMA-ES [2]. Preliminary experiments were conducted in [5], which showed that optimising the weights in Eq. (1) via evolutionary search actually resulted in a better LCDR than the previous approach. The nature of CMA-ES is to explore suboptimal routes until it converges to an optimal route. This implies that the previous approach, of restricting the training data to a single optimal route, may not produce a sufficiently rich training set. That is, the training set should incorporate a more complete knowledge of all possible preferences, i.e., it should make the distinction between suboptimal and sub-suboptimal features, and so on. This would require a Pareto ranking of preferences which can be used to distinguish which feature sets are equivalent, better or worse, and to what degree, e.g. by giving a weight to the preference. The result would be a very large training set, which could of course be re-sampled in order to make learning computationally feasible. In this study we investigate a number of different ranking strategies for creating preference pairs. Alternatively, training data can be generated by following suboptimal solution trajectories.
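To make Eqs. (1) and (2) concrete, the following is a minimal sketch of how an LCDR selects the next job to dispatch. It is illustrative only and not the implementation used in the paper; the structures ready_list and partial_schedule and the feature mapping phi are hypothetical placeholders standing in for the feature space of Table 2.

```python
import numpy as np

def dispatch_next(ready_list, partial_schedule, w, phi):
    """Select the next job to dispatch using a linear composite dispatching rule.

    ready_list       -- jobs that still have unprocessed operations
    partial_schedule -- the current partial schedule (used for post-decision features)
    w                -- weight vector, one entry per feature in Table 2
    phi              -- feature mapping: phi(job, partial_schedule) -> 1-D array
    """
    # Eq. (1): preference estimate h(x_j) = <w . phi(x_j)> for each candidate job j.
    scores = {j: float(np.dot(w, phi(j, partial_schedule))) for j in ready_list}
    # Eq. (2): dispatch the job with the highest preference estimate.
    return max(scores, key=scores.get)
```

Setting all weights to zero except w_6 = 1 (total work remaining) makes this selection equivalent to the most work remaining (MWR) rule, as noted later in the JSP tree representation section.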
As an example of learning from heuristic trajectories, [7] used decision trees to 'rediscover' largest processing time (LPT, a single priority based dispatching rule) by using LPT to create its training data. The limitation of using heuristics to label the training data is that the learning algorithm will mimic the original heuristic (both when it works poorly and when it works well on the problem instances) and does not consider the real optimum. In order to learn heuristics that can outperform existing heuristics, the training data needs to be correctly labelled. This drawback is confronted in [8,15,10] by using an optimal scheduler, computed off-line. In this study, we follow both optimal and suboptimal solution trajectories, but for each partial solution the preference pairs are labelled correctly by solving the partial solution to optimality using a commercial software package [1]. For this study, most work remaining (MWR), a promising SDR for the given data distributions [4], and the CMA-ES optimised LCDRs from [5] are deemed worthwhile for generating suboptimal trajectories.

To summarise, the study considers two main aspects of the generation of training data: a) how preference pairs are added at each decision stage, and b) which solution trajectory (or trajectories) should be sampled, that is, optimal, random, or suboptimal trajectories based on a good heuristic, etc.

The outline of the paper is as follows. First, we illustrate how JSP can be seen as a decision tree where the depth of the tree corresponds to the total number of job dispatches needed to form a complete schedule. The feature space is also introduced, along with how optimal and suboptimal dispatches are labelled at each node in the tree. This is followed by a detailed description of the strategies investigated in this study for ranking preference pairs and sampling solution trajectories. We then perform an extensive study comparing these strategies. Finally, the paper concludes with a discussion and a summary of the main results.

Table 1: Problem space distributions, P.

  name        size (n×m)   N_train   N_test   note
  P_j.rnd     6×5          500       500      random
  P_j.rndn    6×5          500       500      random-narrow

Table 2: Feature space, F.

  φ      Feature description
  φ1     job processing time
  φ2     job start-time
  φ3     job end-time
  φ4     when machine is next free
  φ5     current makespan
  φ6     total work remaining for job
  φ7     most work remaining for all jobs
  φ8     total idle time for machine
  φ9     total idle time for all machines
  φ10    φ9 weighted w.r.t. number of assigned tasks
  φ11    time job had to wait
  φ12    idle time created
  φ13    total processing time for job

1 Problem Space

In this study synthetic JSP data instances of size n×m are considered, where n and m denote the number of jobs and machines, respectively. Problem instances are generated stochastically: the number of jobs and machines is fixed, while processing times are i.i.d. samples from a discrete uniform distribution on the interval I = [u1, u2], i.e., p ~ U(u1, u2). Two different processing time distributions are explored, namely P_j.rnd where I = [1, 99] and P_j.rndn where I = [45, 55], referred to as random and random-narrow, respectively. The machine order of each job is a random permutation of all of the machines in the job-shop. For each data distribution, N_train and N_test problem instances were generated for training and testing, respectively; values for N are given in Table 1. Note that difficult problem instances are not filtered out beforehand, as is done for example in [16].
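As an illustration of the instance generation just described, the sketch below draws processing times from the discrete uniform distribution and assigns each job a random machine permutation. The function name and data layout are assumptions made here for illustration, not the generator actually used in the study.

```python
import numpy as np

def generate_jsp_instance(n=6, m=5, u=(1, 99), rng=None):
    """Generate one synthetic n x m JSP instance.

    Processing times are i.i.d. draws p ~ U(u[0], u[1]) (discrete uniform);
    the machine order of each job is a random permutation of the m machines.
    u=(1, 99) corresponds to P_j.rnd and u=(45, 55) to P_j.rndn.
    """
    rng = rng or np.random.default_rng()
    proc_times = rng.integers(u[0], u[1] + 1, size=(n, m))            # p ~ U(u1, u2)
    machine_order = np.array([rng.permutation(m) for _ in range(n)])  # random routing
    return proc_times, machine_order

# e.g. N_train = 500 instances per distribution, as in Table 1
train_rnd  = [generate_jsp_instance(u=(1, 99))  for _ in range(500)]
train_rndn = [generate_jsp_instance(u=(45, 55)) for _ in range(500)]
```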
2 JSP tree representation

When building a complete JSP schedule, l = n · m dispatches must be made consecutively. A job is placed at the earliest available time slot for its next machine, whilst still fulfilling the constraints that each machine can handle at most one job at a time, and that a job must have finished its preceding machines according to its machine order. Unfinished jobs, referred to as the ready-list and denoted R, are dispatched one at a time according to a heuristic. After each dispatch, the schedule's current features are updated based on the resulting partial schedule. For each possible post-decision state, the temporal features, F, applied in this study are given in Table 2. These features are based on SDRs which are widespread in practice. For example, if w is zero save for w_6 = 1, then Eq. (1) gives h(x_j) > h(x_i) for all jobs i with less work remaining than job j; Eq. (2) then yields the job with the highest φ6 value, i.e., it is equivalent to the dispatching rule most work remaining (MWR).

Figure 1 illustrates how the first two dispatches could be executed for a 6×5 JSP, with the machines a ∈ {M1, ..., M5} on the vertical axis and the horizontal axis giving the current makespan, Cmax. The next possible dispatches are denoted as dashed boxes with the job index j within, and their lengths correspond to the processing times p_ja. The top layer shows an empty schedule. In the middle layer, one of the possible dispatches from the layer above is fixed (depicted solid) and one can see the resulting schedule, i.e., what the next possible dispatches are given this new scenario. Finally, the bottom layer depicts all outcomes if job J3 were dispatched on machine M3.

This sort of tree representation is similar to game trees [9], where the root node denotes the initial (i.e. empty) schedule and the leaf nodes denote complete schedules. The distance k from an internal node to the root therefore gives the number of operations already dispatched. Traversing from root to leaf node, one obtains a sequence of dispatches that yields the resulting schedule, i.e., the sequence indicates in which order the tasks should be dispatched for that particular schedule. However, this sequence of task assignments is by no means unique. Consider a partial schedule further along in the dispatching process, such as in Fig. 1 (top layer): if J1 were dispatched next, and J2 in the following iteration, the resulting schedule would be the same as if J2 had been dispatched first and J1 next (since these are non-conflicting jobs). This indicates that some of the nodes in the tree can merge, even though the states of the partial schedules differ in previous layers. In this particular instance one cannot infer that choosing J1 is better and J2 is worse (or vice versa), since both can yield the same solution. Furthermore, in some cases there can be multiple optimal solutions to the same problem instance. Hence, not only is the sequence representation 'flawed' in the sense that slight permutations on the sequence are in fact equivalent w.r.t. the end result, but varying permutations on the dispatching sequence (given the same initial partial sequence) can result in very different complete schedules with the same makespan, and thus the same deviation from optimality, ρ, defined by Eq. (4), which is the measure under consideration. Care must be taken in this case that neither resulting feature set is labelled as undesirable or suboptimal.
Only features resulting from a dispatch that leads to a suboptimal solution should be labelled as undesirable.
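This labelling idea can be sketched as follows for a single decision stage (node) of the dispatching tree. The helpers partial_schedule.dispatch and solve_to_optimality are hypothetical placeholders; in the study the latter role is played by an exact solver [1]. Only dispatches whose optimally completed partial schedule can no longer reach the instance optimum yield features labelled as undesirable.

```python
def label_decision_stage(ready_list, partial_schedule, phi,
                         optimal_makespan, solve_to_optimality):
    """Create preference pairs at one decision stage of the dispatching tree.

    phi                 -- feature mapping for a post-decision state (Table 2)
    optimal_makespan    -- optimal Cmax of the full problem instance
    solve_to_optimality -- returns the best attainable Cmax of a partial schedule
    """
    optimal_feats, suboptimal_feats = [], []
    for job in ready_list:
        features = phi(job, partial_schedule)            # post-decision state x_j
        next_partial = partial_schedule.dispatch(job)    # hypothetical helper: fix this dispatch
        best_cmax = solve_to_optimality(next_partial)    # complete the partial solution optimally
        if best_cmax == optimal_makespan:
            optimal_feats.append(features)               # dispatch can still lead to an optimum
        else:
            suboptimal_feats.append(features)            # dispatch leads to a suboptimal solution
    # Preference pairs: each optimal feature vector is preferred to each suboptimal one.
    return [(x_opt, x_sub) for x_opt in optimal_feats for x_sub in suboptimal_feats]
```

Which of these preference pairs are kept, and along which trajectories the nodes are visited, are precisely the strategies compared in this study.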
Similar Resources
New scheduling rules for a dynamic flexible flow line problem with sequence-dependent setup times
In the literature, the application of multi-objective dynamic scheduling problem and simple priority rules are widely studied. Although these rules are not efficient enough due to simplicity and lack of general insight, composite dispatching rules have a very suitable performance because they result from experiments. In this paper, a dynamic flexible flow line problem with sequence-dependent se...
Evolutionary Learning of Weighted Linear Composite Dispatching Rules for Scheduling
A prevalent approach to solving job shop scheduling problems is to combine several relatively simple dispatching rules such that they may benefit each other for a given problem space. Generally, this is done in an ad-hoc fashion, requiring expert knowledge from heuristics designers, or extensive exploration of suitable combinations of heuristics. The approach here is to automate that selection ...
Intelligent Scheduling with Machine Learning Capabilities: The Induction of Scheduling Knowledges
Dynamic scheduling of manufacturing systems has primarily involved the use of dispatching rules. In the context of conventional job shops, the relative performance of these rules has been found to depend upon the system attributes, and no single rule is dominant across all possible scenarios. This indicates the need for developing a scheduling approach which adopts a state-dependent dispatching...
Manufacturing Systems Scheduling through Machine Learning
The problem of manufacturing systems scheduling by means of dispatching rules is that these rules depend on the state in which the system is in every moment. Therefore it would be interesting to use in every state of the system, the most adequate dispatching rule to that state. To achieve this goal, it is presented in this paper a scheduling approach which uses machine learning. This approach, ...
Learning dispatching rules via an association rule mining approach
This thesis proposes a new idea using association rule mining-based approach for discovering dispatching rules in production data. Decision trees have previously been used for the same purpose of finding dispatching rules. However, the nature of the decision tree as a classification method may cause incomplete discovery of dispatching rules, which can be complemented by association rule mining ...
Designing Dispatching Rules to Minimize Total Tardiness
We approximate optimal solutions to the Flexible Job-Shop Problem by using dispatching rules discovered through Genetic Programming. While Simple Priority Rules have been widely applied in practice, their efficacy remains poor due to lack of a global view. Composite Dispatching Rules have been shown to be more effective as they are constructed through human experience. In this work, we employ s...